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Abstract 

Mathematical modeling is used as a Systems Biology tool to answer biological questions, and more precisely, to 
validate a network that describes biological observations and predict the effect of perturbations. This article presents 
an algorithm for modeling biological networks in a discrete framework with continuous time. 

Background: There exist two major types of mathematical modeling approaches: (1) quantitative modeling, 
representing various chemical species concentrations by real numbers, mainly based on differential equations and 
chemical kinetics formalism; (2) and qualitative modeling, representing chemical species concentrations or activities 
by a finite set of discrete values. Both approaches answer particular (and often different) biological questions. 
Qualitative modeling approach permits a simple and less detailed description of the biological systems, efficiently 
describes stable state identification but remains inconvenient in describing the transient kinetics leading to these 
states. In this context, time is represented by discrete steps. Quantitative modeling, on the other hand, can describe 
more accurately the dynamical behavior of biological processes as it follows the evolution of concentration or 
activities of chemical species as a function of time, but requires an important amount of information on the 
parameters difficult to find in the literature. 

Results: Here, we propose a modeling framework based on a qualitative approach that is intrinsically continuous in 
time. The algorithm presented in this article fills the gap between qualitative and quantitative modeling. It is based on 
continuous time Markov process applied on a Boolean state space. In order to describe the temporal evolution of the 
biological process we wish to model, we explicitly specify the transition rates for each node. For that purpose, we built 
a language that can be seen as a generalization of Boolean equations. Mathematically, this approach can be translated 
in a set of ordinary differential equations on probability distributions. We developed a C++ software, MaBoSS, that is 
able to simulate such a system by applying Kinetic Monte-Carlo (or Gillespie algorithm) on the Boolean state space. 
This software, parallelized and optimized, computes the temporal evolution of probability distributions and estimates 
stationary distributions. 

Conclusions: Applications of the Boolean Kinetic Monte-Carlo are demonstrated for three qualitative models: a toy 
model, a published model of p53/Mdm2 interaction and a published model of the mammalian cell cycle. Our 
approach allows to describe kinetic phenomena which were difficult to handle in the original models. In particular, 
transient effects are represented by time dependent probability distributions, interpretable in terms of cell populations. 
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Background 

Mathematical models of signaling pathways are tools that 
answer biological questions. The most commonly used 
mathematical formalisms to answer these questions are 
ordinary differential equations (ODEs) and Boolean mod- 
eling. 

Ordinary differential equations (ODEs) have been 
widely utilized to model signaling pathways. It is the most 
natural formalism for translating detailed reaction net- 
works into a mathematical model. Indeed, equations can 
be directly derived using mass action laws, Michaelis- 
Menten kinetics or Hill functions for each reaction 
according to the observed behaviors. This framework 
has limitations, though. The first one concerns the dif- 
ficulty to assign values to the kinetic parameters of the 
model. Ideally, these parameters would be extracted from 
experimental data. However, they are often chosen by 
the modeler so as to fit qualitatively the expected pheno- 
types. The second limitation concerns the cell population 
heterogeneity. In this case, ODEs are no longer appropri- 
ate since the approach is deterministic and thus focuses 
on the average behavior. To include non-determinism, an 
ODE model needs to be transformed into a stochastic 
chemical model. In this formalism, a master equation is 
written on the probabilities of the number of molecules for 
each species. In the translation process, the same param- 
eters used in ODEs (more particularly in ODEs written 
with mass action law) can be used in the master equation, 
but in this case, the number of initial conditions explodes 
along with the computation time. 

Boolean (or logical) formalism is another formalism 
used to model signaling pathways where genes/proteins 
are parameterized by Os and Is only. It is the most nat- 
ural formalism to translate an influence network into a 
mathematical model. In such networks, each node corre- 
sponds to a species and each arrow to an interaction or an 
influence (positive or negative). In a Boolean model, a log- 
ical rule linking the inputs is assigned to each node. As a 
result, there are no real parameter values to adjust besides 
choosing the appropriate logical rules that best describe 
the system. In this paper, we will refer to a state in which 
each node of the influence network has a Boolean value 
as a network state, and the set of all possible transitions 
between the network states as a transition graph. There 
are two types of transition graphs, one deduced from the 
synchronous update strategy [1], for which all the nodes 
that can be updated are updated in one transition, and 
another one deduced from the asynchronous update strat- 
egy [2], for which only one node, of all the possible nodes, 
is updated in one transition. In the Boolean formalism, 
each transition can be interpreted as a "time" step, though 
this "time" does not characterize real biological time but 
rather an event. Stochasticity is an important aspect when 
studying cell populations. In Boolean framework, it can 



be applied: on nodes (by randomly flipping a node state 
[3,4]), on the logical rules (by allowing to change an AND 
gate into an OR gate [5]), and on the update rules (by 
defining the probability and the priority of changing one 
particular Boolean value before others in an asynchronous 
strategy [6] or by adding noise to the whole system in a 
synchronous strategy [7]). One of the main drawbacks of 
the Boolean approach is the explosion of solutions. In an 
asynchronous update strategy, the size of the transition 
graph can reach 2 #nodes . 

Both logical and continuous frameworks have advan- 
tages and disadvantages above-mentioned. We propose 
here to combine some of the advantages of both 
approaches in an algorithm that we call the "Boolean 
Kinetic Monte-Carlo" algorithm (BKMC). It consists of 
a natural generalization of the asynchronous Boolean 
dynamics [2], with a direct probabilistic interpretation. In 
BKMC framework, the dynamics is parameterized by a 
biological time and the order of update is noisy, which is 
less strict than priority classes introduced in GINsim [8]. 
A BKMC model is specified by logical rules as in regular 
Boolean models but with a more precise information: a 
numerical rate is added for each transition of each node. 

BKMC is not intended to replace existing tools but 
rather to complement them. It is best suited to model 
signaling pathways in the following cases: 

• The model is based on an influence network, because 
BKMC is a generalization of the asynchronous 
Boolean dynamics. See "Examples" section. Note that 
this is a common requirement for most of Boolean 
software. 

• The model describes processes for which information 
about the duration of a biological process is known, 
because in BKMC, time is parameterized by a real 
number. This is typically the case when studying 
developmental biology, where animal models provide 
time changes of gene/protein activities [9]. 

• The model describes heterogeneous cell population 
behavior, because BKMC has a probabilistic 
interpretation. For example, modeling heterogeneous 
cell population can help understand tissue formation 
based on cell differentiation [10]. 

• The model can contain many nodes (up to 64 in the 
present implementation), because BKMC is a 
simulation algorithm that converges fast. This can be 
useful for big models that have already been modeled 
with a discrete time Boolean method [11], in order to 
obtain a finer description of transient effects (see 
webpage for examples of published models: https:// 
maboss.curie.fr). 

Previous published works have also introduced a contin- 
uous time approach in the Boolean framework([12-18]). 
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In this article, we will first review some of these works and 
present BKMC algorithm. We will then describe the C++ 
software, MaBoSS, developed to implement BKMC algo- 
rithm and finally illustrate its use with three examples, a 
toy model, a published model of p53-MDM2 interaction 
and a published model of the mammalian cell cycle. 

All abbreviations, definitions, algorithms and estimates 
used in this article can be found in Additional file 1. 
Throughout the article, all terms that are italicized are 
defined in the Additional file 1, "Definitions" 

Results and discussion 

BKMC for continuous time Boolean model 
Continuous time in Boolean modeling: past and present 

In Boolean approaches for modeling networks, the state 
of each node of the network is defined by a Boolean 
value (node state) and the network state by the set of 
node states. Any dynamics in the transition graph is rep- 
resented by sequences of network states. A node state is 
based on the sign of the input arrows and the logic that 
links them. The dynamics can be deterministic in the case 
of synchronized update [1], or non-deterministic in the 
case of asynchronized update [2] or probabilistic Boolean 
networks [7]. 

The difficulty to interpret the dynamics in terms of bio- 
logical time has led to several works that have generalized 
Boolean approaches. These approaches can be divided 
in two classes that we call explicit and implicit time for 
discrete steps. 

The explicit time for discrete steps consists of adding a 
real parameter to each node state. These parameters cor- 
respond to the time associated to each node state before 
it flips to another one ([12,13]). Because data about these 
time lengths are difficult to extract from experimental 
studies, some works have included noise in the definition 
of these parameters [18]. The drawback of this method 
is that the computation of the Boolean model becomes 
sensitive to both the type of noise and the initial con- 
ditions. As a result, these time parameters become new 
parameters that need to be tuned carefully and thus add 
complexity to the modeling. 

The implicit time for discrete steps consists of adding 
a probability to each transition of the transition graph in 
the case of non-deterministic transitions (asynchronous 
case). It is argued that these probabilities could be inter- 
preted as specifying the duration of a biological process. 
As an illustration, let us assume a small network of two 
nodes, A and B. At time t, A and B are inactive: [AB] = [00]. 
In the transition graph, there exist two possible transitions 
at t+1: [00] [01] and [00] [10]. If the first transition 
has a significant higher probability than the second one, 
then we can conclude that B will have a higher tendency 
to activate before A. Therefore, it is equivalent to say that 
the activation of B is faster than the activation of A. Thus, 



in this case, the notion of time is implicitly modeled by 
setting probability transitions. In particular, priority rules, 
in the asynchronous strategy, consist of putting some of 
these probabilities to zero [6]. In our example, if B is faster 
than A then the probability of the transition [00] [10] 
is zero. As a result, the prioritized nodes always activate 
before the others. From a different perspective but keep- 
ing the same idea, Vahedi and colleagues [14] have set up 
a method to deduce explicitly these probabilities from the 
duration of each discrete step. With the implementation 
of implicit time in a Boolean model, the dynamics remains 
difficult to interpret in terms of biological time. 

As an alternative to these approaches, we propose 
BKMC algorithm. 

Properties of BKMC algorithm 

BKMC algorithm was built such as to meet the following 
principles: 

• The state of each node is given by a Boolean number 
(0 or 1), referred to as node state; 

• The state of the network is given by the set of node 
states, referred to as network state; 

• The update of a node state is based on the signs 
linking the incoming arrows of this node and the 
logic; 

• Time is represented by a real number; 

• Evolution is stochastic. 

We choose to describe the time evolution of network 
states by a Markov process with continuous time, applied 
to the asynchronous transition graph. Therefore, the 
dynamics is defined by transition rates inserted in a mas- 
ter equation (see Additional file 1, "Basic information on 
Markov process" section 1.1). 

Markov process for Boolean model 

Consider a network of n nodes (or agents, that can rep- 
resent any species, i.e. mRNA, proteins, complexes, etc.). 
In a Boolean framework, the network state of the sys- 
tem is described by a vector S of Boolean values, i.e. Si € 
{0, 1}, i = 1, . . . , n where S; is the state of the node i. The 
set of all possible network states, also referred to as the 
network state space, will be called E. 

A stochastic description of the state evolution is repre- 
sented by a stochastic process s : t i— > s{t) defined on 
t e I C K applied on the network state space, where / 
is an interval: for each time te/cR, s(t) represents a 
random variable applied on the network state space. Thus, 
the probability of these random variables is written as: 

P [s(t) = S] e [ 0, 1] for any state S € E 

with P [s(t) = S] =1 (1) 
SeS 
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Notice that for all t, s(t) are not independent, therefore 
P [s(t) = S,s(t') = S'j # P [s(0 = S] P [s(/) = S']. From 
now on, we define P [s(t) = S] as instantaneous probabil- 
ities. Since the instantaneous probabilities do not define 
the full stochastic process, all possible joint probabilities 
should also be defined. 

In order to simplify the stochastic process, Markov 
property is imposed. It can be expressed in the following 
way: "the conditional probabilities in the future, related to 
the present and the past, depend only on the present" (see 
Additional file 1, "Basic information on Markov process", 
section 1.1 for the mathematical definition). The formal 
definition of a Markov process is a stochastic process with 
the Markov property. 

Any Markov process can be defined by (see Van Kampen 
[19], chapter IV): 

1. An initial condition: 

P[s(0) = S] ; VS e E (2) 

2. Conditional probabilities (of a single condition): 

P [s(t) = S\s(t') = S'] ; VS, S' e E ; W, tel;t? <t 

(3) 

Concerning time, two cases can be considered: 

• If time is discrete: t e I = {to, t\, ■ ■ ■ }, it can be shown 
that all possible conditional probabilities are function 
of transition probabilities [20]: 

P [s(ti) = S|s(f,-_i) = S']. In that case, a Markov 
process is often named a Markov chain. 

• If time is continuous: t e I =[ a, b], it can be shown 
that all possible conditional probabilities are function 
of transition rates [19]: /0(s'-»-s)(*) e[0, oo]. 

Notice that a discrete time Markov process can be 
derived from continuous time Markov process, and is 
called a Jump Process with the following transition proba- 
bilities: 



Z.S"eS Ps^S" 

If the transition probabilities or transition rates are time 
independent, the Markov process is called a time inde- 
pendent Markov process. In BKMC, only this case will be 
considered. For a time independent Markov process, the 
transition graph can be defined as follows: a transition 
graph is a graph in E, with an edge between S and S' if and 
only if p s ^ s / > 0 (or P [s(f,-) = S|s(f,-_i) = S'] > 0 if time 
is discrete). 



Asynchronous Boolean dynamics as a discrete time Markov 
process 

Asynchronous Boolean dynamics [2] is widely used in 
Boolean modeling. It can be easily interpreted as a discrete 
time Markov process [21,22] as shown below. 

In the case of asynchronous Boolean dynamics, the sys- 
tem is given by n nodes (or agents), with a set of directed 
arrows linking these nodes and defining a network. For 
each node i, a Boolean logic Bt(S) is specified and depends 
only on the nodes / for which there exists an arrow from 
node j to i {e.g. B\ = S3 AND NOTS4, where S3 and S4 are 
the Boolean values of nodes 3 and 4 respectively, and£i is 
the Boolean logic of node 1). The notion of asynchronous 
transition (AT) can be defined as a pair of network states 
(S, S') e E, written (S S') such that 

S'j = Bj(S) for a given j 

S> = Si for ij^j (4) 

To define a Markov process, the transition probabili- 
ties P [s(t{) = S|s(£;_i) = S'] can be defined: given two 
network states S and S', let y(S) be the number of asyn- 
chronous transitions from S to all possible states S'. Then 

P [s(f,0 = S'|s(f,-_i) = S] = l/y(S) if (S -> S') is an AT 
P [s(fj) = S'|s(f,-_i) = S] = 0 if (S -» S') is not an AT 

(5) 

In this formalism, the asynchronous Boolean dynam- 
ics completely defines a discrete time Markov pro- 
cess when the initial condition is specified. Notice that 
here the transition probabilities are time independent, 
i.e. P [s(ti) = S|5(ti_i) = S'] = P [sfe+i) = S\s(ft) = S']. 
Therefore, the approaches, mentioned in section "Con- 
tinuous time in Boolean modeling: past and present" that 
introduce time implicitly by adding probabilities to each 
transition of the transition graph, can be seen as a gener- 
alization of the definition of y (S). 

Continuous time Markov process as a generalization of 
asynchronous Boolean dynamics 

To transform the discrete time Markov process described 
above in a continuous time Markov process, tran- 
sition probabilities should be replaced by transition 
rates P(s->-s') - ^ n tnat case > conditional probabilities are 
computed by solving a master equation (equation 2 in 
Additional file 1, "Basic information on Markov process" 
section 1.1). We present below the corresponding numer- 
ical algorithm, the Kinetic Monte-Carlo algorithm [23]. 

Because we want a generalization of the asynchronous 
Boolean dynamics, transition rates P(s->s') are non-zero 
only if S and S' differ by only one node. In that case, 
each Boolean logic Bi(S) is replaced by two functions 
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R t (S) € [ 0, oo [. The transition rates are denned as 

follows: if i is the node that differs from S and S', then 

P ( s^s') =< P (S) ifSj = 0 

P(S ^ s , ) =Rf™ n (S)ifS i = l (6) 

where 7?" p corresponds to the activation rate of node i, 
and Rjovm corresponds to the inactivation rate of node i. 
Therefore, the continuous Markov process is completely 
defined by all these 7? u P /down and an initial condition. 

Asymptotic behavior of continuous time Markov process 

In the case of continuous time Markov process, instan- 
taneous probabilities always converge to a stationary dis- 
tribution (see Additional file 1, "Basic information on 
Markov process" corollary 2, section 1.2). A station- 
ary distribution of a given Markov process corresponds 
to the set of instantaneous probabilities of a stationary 
Markov process which has the same transition probabil- 
ities (or transition rates) as the given discrete (or contin- 
uous) time Markov process. A stationary Markov process 
has the following property: for every joint probability 
P[s(fi) = S (1 \s(t 2 ) = S (2) ,...] andVr: 

P[i(tl)=S (1) ,5(fe)=S (1) ,...] 

= P [s(h + r) = S (1) ,s(t 2 + r) = S (1) , . . .] (7) 

Notice that instantaneous probabilities P [s(t) = S] of a 
stationary stochastic process are time independent. 

The asymptotic behavior of a continuous time Markov 
process can be detailed by using the concept of indecom- 
posable stationary distributions: indecomposable station- 
ary distributions are stationary distributions that cannot 
be expressed as a linear combination of different sta- 
tionary distributions. A linear combination of stationary 
distributions is also a stationary distribution, since instan- 
taneous probabilities are solutions of a master equation 
which is linear (see Additional file 1, "Basic information 
on Markov process" equation 2, section 1.1). Therefore, a 
complete description of the asymptotic behavior is given 
by the linear combination of indecomposable stationary 
distributions to which the Markov process converges. 

Oscillations and cycles 

In order to describe a periodic behavior, the notion of 
cycle and oscillation for a continuous time Markov pro- 
cess is defined precisely. 

A cycle is a loop in the transition graph. This is a topo- 
logical characterization in the transition graph that does 
not depend on the exact value of the transition rates. It 
can be shown that a cycle with no outgoing edges corre- 
sponds to an indecomposable stationary distribution (see 
Additional file 1, "Basic information on Markov process" 
corollary 1, section 1.2). 



The question is then to link the notion of cycle to 
that of periodic behavior of instantaneous probabilities. 
The set of instantaneous probabilities cannot be perfectly 
periodic. They can display a damped oscillating behavior, 
or none at all (see Additional file 1, "Basic information 
on Markov process" section 1.3). A damped oscillatory 
Markov process can be formally defined as a continuous 
time process that has at least one instantaneous probabil- 
ity with an infinite number of extrema. 

According to theorems described in Additional file 1 
("Basic information on Markov process", theorems 6-8 and 
Corollary 3, section 1.3), a necessary condition for hav- 
ing damped oscillations is that the transition matrix has at 
least one non-real eigenvalue (see Additional file 1, "Basic 
information on Markov process" equation 4, section 1.1). 
In that case, there always exists an initial condition that 
produces damped oscillations. For the transition matrix 
to have a non-real eigenvalue, a Markov process needs to 
have a cycle. However, the reverse is not true: a Markov 
process with a cycle does not necessarily imply the exis- 
tence of a non-real eigenvalue in the transition matrix. In 
the toy model of a single cycle, presented in the "Exam- 
ples" section, non-real eigenvalues may or may not exist, 
according to different values of transition rates. 

BKMC: Kinetic Monte-Carlo (Gillespie algorithm) applied to 
continuous time asynchronous Boolean Dynamics 

It has been previously stated that a continuous time 
Markov process is completely defined by its initial con- 
dition and its transition rates. For computing any con- 
ditional probability (and any joint probability), a set of 
linear differential equations has to be solved (the mas- 
ter equation). Theoretically, the master equation can be 
solved exactly by computing the exponential of the tran- 
sition matrix (see Additional file 1, "Basic information 
on Markov process" equation 5, section 1.1). However, 
because the size of this transition matrix is 2" x 2", the 
computation soon becomes impossible if n is large. To 
remedy this problem, it is possible to use a simulation 
algorithm that samples the probability space by comput- 
ing time trajectories in the transition graph. 

The Kinetic Monte-Carlo [23] (or Gillespie algorithm 
[24]) is a simple algorithm for exploring the probability 
space of a Markov process defined by a set of transition 
rates. In fact, it can be understood as a formal definition 
of a continuous time Markov process. This algorithm pro- 
duces a set of realizations or stochastic trajectories of the 
Markov process, given a set of uniform random numbers 
in [ 0, 1]. By definition, a trajectory S(£) is a function from a 
time window [ 0, i max ] to X. The set of stochastic trajecto- 
ries represents the given Markov process in the sense that 
these trajectories can be used to compute probabilities. 
A finite set of these trajectories is produced, then, from 
this finite set, probabilities are estimated (as described in 
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"Methods" section). The algorithm is based on an iterative 
step: from a state S at time to (given two uniform random 
numbers), it produces a transition time St and a new state 
S', with the following interpretation: the trajectory S(t) is 
such that S(t) = S for t e [ to, to + St] and S(t 0 + St) = S'. 
Iteration of this step is done until a specified maximum 
time is reached. The initial state of each trajectory is based 
on the (probabilistic) initial condition that also needs to 
be specified. 

The exact iterative procedure is the following. Given S 
and two uniform random numbers u, u! € [ 0, 1]: 

1. Compute the total rate of possible transitions for 
leaving state S: 

Ptot = J2s' P(S-»S')- 

2. Compute the time of the transition: 

St = - log(w)/Aot 

3. Order the possible new states S'®,j = 1 . . . and their 
respective transition rates = p^ s ^ s ,f))y 

4. Compute the new state S' ( ^ such that 

£/=0 Pj < (w'Ptot) < E ; =o Pj ( b y convention, 
p<°> = 0). 

This algorithm will be referred to as Boolean Kinetic 
Monte-Carlo or BKMC. 

Practical use of BKMC, through MaBoSS tool 

Biological data are translated into an influence network 
with logical rules associated to each node of the net- 
work. The value of one node depends on the value of 
the input nodes. For BKMC, another layer of informa- 
tion is provided when compared to the standard defini- 
tion of Boolean models: transition rates are provided for 
all nodes, specifying the rates at which the node turns 
on and off. This refinement conserves the simplicity of 
Boolean description but allows to reproduce more accu- 
rately the observed biological dynamics. The parameters 
do not need to be exact as it is the case for nonlinear ordi- 
nary differential equation models, but they can be used 
to illustrate the relative speed of reactions. We developed 
a software tool, MaBoSS, that applies BKMC algorithm. 
MaBoSS stands for Markov Boolean Stochastic Simulator. 

How to build a mathematical model using MaBoSS 

Once MaBoSS is installed (see webpage for instructions, 
https://maboss.curie.fr), the protocol to follow to simulate 
a model can be described in four steps: 

1. Create the model using MaBoSS language in a file 
(myfile.bnd, for instance): (a) write the logic for each 
node, and (b) assign values to each transition rate. 

2. Create the configuration file (myfile.cfg, for instance) 
to define the simulation parameters. 



3. Run MaBoSS (the order of the arguments does not 
matter): 

MaBoSS -c myfile.cfg -o myfile_out 
myf ile . bnd 

(we assume that MaBoSS is accessible through you 
PATH). 

MaBoSS creates three output files: 

• myf ile_out_proptraj . csv 

This file contains the network state probabilities 
on a time window, the entropy, the transition 
entropy and the Hamming distance distribution 
(see "Methods") 

• myf ile_out_statdist . csv 

This file contains the stationary distribution 
characterization (see "Methods") 

• myfile_out_run.txt 

This file contains a summary of MaBoSS 
simulation run. 

4. Import output csv files in Excel or R and generate 
your graphs. 

Transition rates in MaBoSS 

MaBoSS defines transition rates P(s->.s') by the functions 
i?" p/down (S) (see equations 6). The functions can be writ- 
ten using all Boolean operators (AND, OR, NOT, XOR), 
arithmetic operators (+,-,*,/), comparison operators and 
the conditional operator (?:). Examples of the use of the 
language are given below to illustrate three different cases: 
(1) different speeds for different inputs, (2) buffering effect 
and (3) the translation of discrete variables (with three 
values: 0, 1 and 2) into a Boolean model. 

1. Modeling different speeds for different inputs. 

Suppose that C is activated by A or B, but that B can 
activate C faster than A, and that C is inactivated 
when A and B are absent. In this case, we write: 

node C { 

rate.up = B ? $kb : (A ? $ka : 0.0); 
rate_down = ! (A & B ) ? 1.0 : 0.0; 

} 

When C is off (equal to 0), it is activated by B at a 
speed $kb. If B is absent, then C is activated by A at a 
speed $ka. If both are absent, C is not activated. Note 
that if both A and B are present, because of the way 
the logic is written in this particular case, C is 
activated at the highest speed, the speed $kb. When 
C is on (equal to 1), it is inactivated at a rate equal to 
1 in the absence of both A and B. 
To implement the synergistic effect of A and B, i.e. 
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when both A and B are on, C is activated at a rate 
$kab, then we can write: 

node C { 

rate_up = (A & !B ? $ka : 0.0) + (B 
& !A ? $kb : 0.0) + (A & B ? $kab : 
0.0); 

rate.down = ! (A & B ) ? 1 . 0 : 
0.0; 



2. Modeling buffering effect. 

Suppose that B is activated by A, but that B can 
remain active a long time after A has shut down. For 
that, it is enough to define different speeds of 
activation and inactivation: 

node B { 

rate.up = A ? 2 . 0 : 0.0; 
rate.down = A ? 0.0 : 0.0 01; 

} 

B is activated by A at a rate equal to 2. When A is 
turned off, B is inactivated more slowly at a rate equal 
to 0.001. 

3. Modeling different levels for a given node. 
Suppose that B is activated by A, but if the activity of 
A is maintained, B can reach a second level. For this, 
we define a second node BJh (for "B high") with the 
following rules: 

node B { 

rate.up = A ? 1 . 0 : 0.0; 
rate.down = (A | B_h) ? 0.0 : 1.0; 

} 



node B_h { 

rate.up = (A & B) ? 1 . 0 
rate_down = (A) ? 0 . 0 : 

} 



0.0; 

0; 



In this example, B is separated in two variables: B 
which corresponds to the first level of B and B_h 
which corresponds to the higher level of B. B is 
activated by A at a rate equal to 1. If A disappears 
before B has reached its second level B_h then B is 
turned off at a rate equal to 1. If A is maintained and 
B is active, then B_h is activated at a rate equal to 1. 
When A is turned off, B_h is inactivated at a rate 
equal to 1. 

Simulation parameters in MaBoSS 

To simulate a model in MaBoSS, a set of parameters needs 
to be adjusted (see "Parameter list" in the reference card 
available in the webpage). MaBoSS assigns default values, 
however, they need to be tuned for each model to achieve 



optimal performances: the best balance between the con- 
vergence of estimates and the computation time needs to 
be found. Therefore, several simulations should be run 
with different sets of parameters for best tuning. 

• Internal nodes: node.isJnternal 

As explained in "Methods" (in "Initial conditions and 
outputs"), internal nodes correspond to species that 
are not measured explicitly. Practically, the higher the 
number of internal nodes, the better the convergence 
of the BKMC algorithm. 

• Time window for probabilities: timetick 

This parameter is used to compute estimates of 
network state probabilities (see "Network state 
probabilities on a time window" in "Methods"). A 
time window can be set as the minimum time needed 
for nodes to change their states. This parameter also 
controls the convergence of probability estimates. 
The larger the time window, the better the 
convergence of probability estimates. 

• Maximum time: maxMme 

MaBoSS produces trajectories for a predefined 
amount of time, set by the parameter max_time. If the 
time of the biological process is known, then the 
maximum time parameter can be explicitly set. If the 
time of the biological process is not known, then 
there exists a more empirical way to set the 
maximum time. It is advised to choose a maximum 
time parameter that is slightly bigger than the inverse 
of the smallest transition rate. 
Note that the computing time in MaBoSS is 
proportional to this maximum time. Moreover, the 
choice of the maximum time impacts the stationary 
distribution estimates: a longer maximum time 
increases the quality of these estimates. 

• Number of trajectories: sample -count 

This parameter directly controls the quality of BKMC 
estimation algorithm. Practically, the convergence of 
the estimates increases as the number of trajectories 
is increased. 

• Number of trajectories (statdistMaj Lcount) and 
similarity threshold {statsdist-duster_threshold) for 
stationary distribution estimates 

The statdist-traj .count parameter corresponds to a 
subset of trajectories used only for stationary 
distribution estimates. To avoid explosion of 
computing time, this parameter needs to be lower 
than the number of trajectories (rather than equal to). 
The statsdist-cluster_threshold parameter 
corresponds to the threshold for constructing the 
clusters of stationary distribution estimates. Ideally, it 
should be set to a high value (close to 1). However, if 
the threshold is too high then the clustering 
algorithm might not be efficient. 
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Comparison with biological data 

Each node of the network should account for different 
levels of activity of the corresponding species (mRNA, 
protein, protein complex, etc.). It is possible to have more 
than two levels for one node, as shown in the example 
"Modeling different levels for a given node" 

It is possible to extract the transition rates from exper- 
imental data, using the following property: the rate of a 
given transition is the inverse of the mean time for this 
transition to happen. It should be noticed than BKMC 
is an algorithm based on a linear equation (Additional 
file 1, "Basic information on Markov process" equation 2, 
section 1.1); therefore, small variations of transition rates 
will not affect the qualitative behavior of the model. 

BKMC algorithm provides estimates of the network 
state probabilities over time. These probabilities can be 
interpreted in terms of a cell population. The asymptotic 
behavior of a model, represented by a linear combination 
of indecomposable stationary distributions, can be inter- 
preted as a combination of cell sub-populations. Indeed, 
a sub-population can be defined by network states with 
non-zero probability in the indecomposable stationary 
distribution. Therefore, a cell in a sub-population can only 
evolved in this sub-population (Additional file 1, "Basic 
information on Markov process", corollary 1, section 1.2 
and from the definition of strongly connected component 
with no outgoing edges). 

Comparison ofMaBoSS with other existing tools for 
qualitative modeling 

MaBoSS contributes to the effort of tool development for 
qualitative modeling of biological networks. We propose 
to compare MaBoSS to some existing tools. However, it is 
difficult to compare the performance of these tools since 
each of them achieves different purposes and provides 
different outputs. As an alternative, we recapitulate, in 
Figure 1, the characteristics and implications for each soft- 
ware. Some tools may be more appropriate than others 
according to the type of input, network size and expected 
output. Figure 1 is intended to help the users decide which 
software to use in a practical situation. We consider the 
following tools: GINsim [8], CellNetAnalyzer [25], Bool- 
Net [26], GNA [27], and SQUAD [28]. This list is not 
exhaustive but informs on where MaBoSS stands. 

As an illustration, the third example of the "Exam- 
ples" section below, the mammalian cell cycle, was imple- 
mented in three of the tools presented in Figure 1: 
MaBoSS, GINsim, BoolNet (see Additional file 2 "Model 
of the mammalian cell cycle with GINsim, BoolNet and 
MaBoSS." for details of the results). 

Examples 

We have applied BKMC algorithm to three models of dif- 
ferent sizes. The first one is a toy model illustrating the 



dynamics of a single cycle; the second one is a published 
Boolean model of p53-Mdm2 response to DNA damage 
and illustrates a multi-level case; and the third one is a 
published Boolean model of mammalian cell cycle regu- 
lation. Note that MaBoSS has been used for these three 
examples, but Markov process can be computed directly 
for the two first ones, without our BKMC algorithm 
because these models are small enough (by computing 
exponential of transition matrix, see Additional file 1, 
"Basic information on Markov process" section 1.1), as 
proposed in [16]. BKMC is best suited for larger networks, 
when the network state space is too large to be com- 
puted with standard existing tools (>~ 2 10 ). The first two 
examples were chosen for their simplicity, and because 
they illustrate how global characterizations (entropy and 
transition entropy, see "Entropies" in "Methods") can be 
used. The third example shows the use of BKMC/MaBoSS 
for a more consequent and complex model for which the 
analysis is not obvious. 

For the purpose of this article, we built the transition 
graphs for the first two examples (with GINsim [8]) in 
order to help the reasoning. However, it is important 
to note that BKMC algorithm does not construct the 
transition graph explicitly. 

All input files and results are given in the web- 
page of MaBoSS (https://maboss.curie.fr) with additional 
examples. 

Toy model of a single cycle 

We consider three species, A, B and C, where A is acti- 
vated by C and inhibited by B, B is activated by A and C is 
activated by A or B (Figure 2a). 

The model is defined within the language of MaBoSS by 
a set of logical rules associated to each node (Figure 2b) 
and simulation parameters set for optimal performances 
(Figure 2c). The associated transition graph, generated by 
GINsim, is shown in Figure 3. 

The only stationary distribution is the fixed point 
[ABC] = [000]. We study two cases: when the rate of the 
transition from state [001] to state [000] (corresponding 
to the inactivation of C) is fast and when this rate is slow. 
We will refer to this transition rate as the escape rate. For 
both cases, we plot the time trajectories of the probabili- 
ties of the fixed point [ABC] = [000] and of the probabilities 
of A active [ABC] = [1**] where * can be either 1 or 0, along 
with the trajectories of the entropy and the transition 
entropy. 

In the first case, when the escape rate is fast, we 
set the parameter for the transition to a high value 
(rate.up = 10). In Figure 4, we notice that the probability 
that [ABC] is equal to [000] converges to 1. We can con- 
clude that [ABC] = [000] is a fixed point. In addition, the 
entropy and the transition entropy converge to 0. With 
BKMC, these properties confirm that [ABC] = [000] is a 
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Logical rules and transition 
rates 



Probabilistic logical rules 



Chemical reactions, 
chemical constants 



Chemical reaction parameters 
for ODE modeling, translated 
from Boolean model 



Size of network 



Up to 64 nodes 



<~20 nodes for 

dynamical 

analysis 



Large network (~ 100 
nodes) 



Large network (~100 nodes) 



Very large network (~ 1000 
nodes) for steady states 
identification 



Time dependent 
probabilities 



Transition graph 



State trajectories 
{synchronized update) 



State trajectories (synchronized 
updated, asynchronized update, 
probabilistic synchronized update) 



Qualitative state 
transition graph 



Time trajectories of associated 
to the ODE model 



Linear combination of 
indecomposable stationary 
distributions 



Steady states 



Interpretation 
of results 



Time dependent gene 
activities 



Possible transient 
gene activities 



Transient gene activities 



Possible transient gene activities 



Discretized protein 
concentrations 



Time dependent protein 
concentration at single cell level 



Asymptotic behavior 



Possible 

asymptotic 

behavior 



Asymptotic stable 
behavior 



Possible asymptotic behavior 



Asymptotic behavior 
discretized protein 
concentrations 



Possible asymptotic stable 
behavior 



Cell population 



single cell (cell population in case 
of probabilistic logical rules) 



Type of 
computation 



Exact/simulation 



Exact/simulation 



Environment 



Command line 



Integrated 
software 



Integrated software and 
command line 



Integrated software 



Interface 
characteristics 



Need knowled 
ASCII file (perl, 



je in handling 
etc.) 



User friendly within 
MATLAB environment 



Need knowledge in R 



Possible external 
software 
management 



Need knowledi 



Need 

understanding of 
JAVA libraries 



Need knowledge in 
MATLAB 



Need knowledge in R 



Need UNIX/MS-DOS 
knowledge 



Need understanding of JAVA 
libraries 



I 



Examples of 
outputs 




IT 






Figure 1 Comparison of tools for discrete modeling, biological implication. Comparison table of the following tools 
CellNetAnalyzer, BoolNet, GNA, SQUAD. Technical aspects are provided, along with the inputs/outputs relations between 
row illustrates graphically the typical outputs that can be obtained from each tool. 



MaBoSS, GINsim, 

a model and data. The last 













a 


NodeC 
{ 

rate up=0.0; 

rate down^(NOTA)AND(NOTB))?$escape:0.0; 
} 

Node A 
{ 

rate_up=(CAND(NOTB)) ? $Au : 0.0; 

rate down=B? $Ad : 0.0; 

} 

NodeB 
{ 

rate up=A? $Au : 0.0; 
rate down =A? 0.0 : $Ad; 
} 

b 


$Au=l; 

$Bu=2; 

$Bd=3; 

$Ad=4; 

$escape=10; 

Cistate=l; 

Aistate=l; 

B.istate=l; 

Arefstate=0; 

discrete_time =0; 

usephysrandgen =FALSE; 

seed pseudorandom =100; 

sample_count =500000; 

max_time=5; 

time_tid<=0.01; 

threadcount =4; 

statdist_traj_count =100; 

statdistdusterthreshold =0.9; 

C 




Figure 2 Toy model. Toy model of a single cycle, (a) Influence network, (b) Logical rules and transition rates of the model, (c) Simulation parameters. 
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110 



Figure 3 Transition graph of the toy model. Transition graph for 
the toy model (generated byGINsim).The node states should be read 
as [ABC] = [***]. [ABC]=[100] corresponds to a state in which only A is 
active. The nodes in green belong to a cycle, the node in red is the 
fixed point and the other nodes are in blue. 



fixed point. The peak in the trajectory of the entropy 
(between times 0 and 0.6) corresponds to a set of states 
that are transiently activated before reaching the fixed 
point. 

In the second case, when the escape rate is slow, 
we set the parameter for the transition to a low value 
(rate_down = 10 -5 ). As illustrated in Figure 5, the tran- 
sition entropy is and remains close to zero but the 
entropy does not converge to zero, which is the signa- 
ture of a cyclic stationary distribution (see "Entropies" in 
"Methods"). This corresponds to the cycle [111] — > [Oil] 
— > [001] — > [101] in the transition graph (Figure 3). How- 
ever, as seen in the transition graph, one state in the cycle 
has an outgoing edge that leads to the fixed point (through 
the transition [001] [000] in Figure 3). If the trajectories 
are plotted on a larger time scale (Figure 6), the entropy 
eventually converges to 0 and the trajectory of the fixed 
point converges to 1, which corresponds to the case of fast 
escape rate. Since the value of the transition entropy of 
Figure 5 is not exactly zero, but 10~ 4 , it can be anticipated 
that the cyclic behavior is not stable. We can conclude on 
stable cyclic behaviors only when the transition entropy is 
exactly 0. 

By considering the spectrum of the transition matrix 
(see Additional file 1, "Basic information on Markov pro- 
cess", section 1.1 and proof of theorem 4), it can be proven 
that the model with a slow escape rate is a damped oscil- 
latory process whereas the model with a large escape rate 
is not. As mentioned previously, a cycle in the transition 
graph may or may not lead to an oscillatory behavior. 
Moreover, if the transition entropy seems to converge to 
a small value on a small time scale, and the entropy does 
not, this behavior illustrates the case of a transient cycle in 
the transition graph. 
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Figure 4 MaBoSS outputs of the toy model with fast escape rate. BKMC algorithm is applied to the toy model, with a fast escape rate. Trajectory 
of the network state probabilities [ABC]=[000] and [ABC]=[1**] (where * can be either 0 or 1), the entropy (H) and the transition entropy (77-/) are 
plotted. Because the probability of [ABC]=[000] converges to 1, [ABC]=[000] is a fixed point. The asymptotic behavior of both the entropy and the 
transition entropy is also the signature of a fixed point. 
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Figure 5 MaBoSS outputs of the toy model with slow escape rate. BKMC algorithm is applied to the toy model, with a slow escape rate. 
Trajectory of the network state probabilities [ABC]=[000] and [ABC]=[1**], the entropy (H) and the transition entropy (TH) are plotted. The 
asymptotic behavior of both the entropy and the transition entropy seems to be the signature of a cycle. 



p53-Mdm2 signaling 

We consider a model of p53 response to DNA dam- 
age [18]. p53 interacts with Mdm2, which appears in 
two forms, cytoplasmic and nuclear. On one hand, p53 
upregulates the level of cytoplasmic Mdm2 (Mdm2c), 
which is then transported into the nucleus, and inhibits 
the export of nuclear Mdm2 (Mdm2n). On the other 
hand, nuclear Mdm2 (Mdm2n) facilitates the degrada- 
tion of p53 through ubiquitination. In the model, stress 
regulates the level of DNA damage (Dam), which in 
turn participates in the degradation process of Mdm2 
in the nucleus. p53 inhibits DNA damage signal by pro- 
moting DNA repair. Here, stress is not shown explicitly 
(Figure 7a). 



The model is written in MaBoSS, with two levels of p53 
(Figure 7b), as it is done in Abou-Jaoude et al. [18] with the 
appropriate simulation parameters (Figure 7c). The asso- 
ciated transition graph, also generated by GINsim, is given 
in Figure 8. It shows the existence of two cycles and of a 
fixed point [p53 Mdm2C Mdm2N Dam] = [0010] where 
nuclear Mdm2 is on and the rest is off. 

In order to represent the activity of p53, the trajectories 
of the probabilities of all network states with p53 equal to 1 
and with p53 equal to 2 are plotted (Figure 9, upper panel), 
with the initial condition: [p53 Mdm2C Mdm2N Dam] = 
[0*11] and for the situation when p53 is set to its highest 
value (2 equivalent to p53_h) and thus can promote Mdm2 
cytoplasmic activity. 




Figure 6 MaBoSS outputs of toy model with slow escape rate, large time scale. BKMC algorithm is applied to the toy model, with a slow 
escape rate, plotted on a larger time scale. Trajectory of probabilities ([ABC]=[000] and [ABC]=[1**]), the entropy (H) and the transition entropy (TH) 
are plotted. On a large time scale, the asymptotic behavior of both the entropy and the transition entropy is similar to the case of the fast escape 
rate (Figure 3). 
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node p53 
< 

logic =N0TMdm2N; 

rateup =(@logic ? 1.0 : 0.0)/$tp53ij ; 

ratedown =(((NOT@logjc) AND NOTp53_h) ? 1.0 : 0.0)/$tp53d ; 
node p53_h 
logic =NOTMdm2N; 

rate_up -((©logic AND p53) ? 1.0 : 0.0)/$tp53hu; 
rate down =((@logic? 0.0 : 1.0))/$tp53hd; 



logic =$case_a ? p53_h : p53; 
rateup =(@logic? 1.0 : 0.0)/$tMOi; 
ratedown =(@logic ? 0.0 : 1.0)/$tMCd; 

node Mdm2N 

logjc_p53 =$case_a ? p53 : p53_h; 

rate_up=(((@logic_p53ANDMdrr^CANDDarr^?$l<Mn_pMCD:0.0) + 
((@logic_p53ANDMdrr2CAND(NCfTDarn))? $KMn_pMC: 0.0) + 
((@loojc_p53 AND (NOT Mdm2Q AND Dam) ? $KMn_pD : 0.0) + 
((@logic_p53AND(NOTMdm2QAND(NOTDam))?$KMn_p:0.0) + 
((NOT@logc_p53 ANDMdm2CANDDam) ? $KMn_MCD: 0.0) + 
((NOT@logic_p53 ANDMdm2CAND(NOTDam)) ? $KMn_MC: 0.0) + 
((NOT@logic_p53 ANDNOTMdm2CANDDam) ? $KMn_D: 0.0) + 
((NOT@logic_p53 ANDNOTMdm2CAND(NOTDam)) ? $KMn : 0.0))/$tMNu; 

rate down -(((@logJc_p53ANDMdm2CANDDam)? (l-$KMn_pMCD): 0.0) + 
((@loojc_p53 ANDMdrr2CAND(NOTDam)) ? (l-$KMn_pMQ : 0.0) + 
((@logic_p53 AND (NOT Mdm2Q ANDDam) ? (l-$KMn_pD) : 0.0) + 
((@logic_p53 AND (NOTMdm2Q AND (NOT Dam)) ? (1-$KM n_p) : 0.0) + 
((NOT@logic_p53 ANDMdm2CANDDam) ? (l-SKMn_MCD) : 0.0) + 
((NOT@logic_p53 ANDMdm2CAND(NOTDam)) ? (l-5KMn_MQ : 0.0) + 
((NOT@logic_p53 ANDNOTMdm2CANDDam) ? (l-$KMn_D) : 0.0) + 
((NOT@logic_p53ANDNOTMdm2CAND(NOTDam))?(l-SKMn):0.0))/$tMNd; 

} 

node Dam 
{ 

logic =$case_a ? p53_h : p53; 
rateup =@logjc ? 0.0 : 0.0; 
rate_down =(@logic? 1.0 : 0.0)/$tOd; 

> \ 



$fast -100; 

$ca5e_a -TRUE; 

$KMn=l; 

$KMn_p=0; 

$KMn_MC=l; 

$KMn_pMC=4; 

$KMn_D=0; 

$KMn_pL>0; 

$KMn_MCD=l; 

$KMn_pMCD=l; 

$tp53u=2; 

$tp53d=2; 

$tp53tiu=l; 

$tp53hd=l; 

$tMQj=0.6; 

$tMCd=l; 

$tMNu=0.3; 

$tMNd=l; 

$tDd=5; 

Damistate=TRJJE; 
Mdm2N.istate-TRUE; 
p53.istate=FALSE; 
p53_h.istate -FALSE; 
samplecount =50000; 
max_time=50; 

time_tick= 0.1; discrete_time =0; 
usephysrandgen = FALSE; 
seed_pseudorandom =100; 
p53.refstate=0; 
p53_h.refstate -0; 
threadcount =1; 
statdist_traj_ODunt =100; 
statdist duster threshold -0.9; 



Figure 7 Model of p53 response to DNA damage. Model of p53 response to DNA damage, (a) Influence network, (b) Logical rules and transition 
rates of the model, (c) Simulation parameters. 
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Figure 8 Transition graph of the model of p53 response to DNA damage. Transition graph of the p53 model (generated by GINsim).The node 
states should be read as [p53 Mdm2C Mdm2N Dam] = [****] (where * can be either 0 or 1 }. For instance, [p53 Mdm2C Mdm2N Dam]=[1 000] 
corresponds to a state in which only p53 (at its level 1) is active. The nodes in green and the nodes in light blue belong to two cycles, the node in 
red is the fixed point and the other nodes are in dark blue. 
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Figure 9 MaBoSS outputs of the model of p53 response to DNA 
damage. Trajectories of the network state probabilities of [p53 
Mdm2C Mdm2N Dam] = [1 ***] and of [p53 Mdm2C Mdm2N Dam] = 
[2***], the entropy (H) and the transition entropy (TH) are plotted. 



The qualitative results obtained with MaBoSS are sim- 
ilar to those of Abou-Jaoude and colleagues. However, at 
the level of cell population, some discrepancies appear: in 
Figure 9, no damped oscillations can be seen as opposed to 
Figure 8 of their article. The reason is that, in their compu- 
tations, the noise imposed on time is defined by a square 
distribution on a limited time frame, whereas in BKMC, 
Markovian hypotheses imply that the noise distribution 
is more spread out from 0 to infinity. The consequence 
is that synchronization is lost very fast. Damped oscilla- 
tions could be observed with BKMC with a particular set 
of parameters: fast activation of p53 and slow degradation 
of p53 (results not shown). 

With MaBoSS, we clearly interpret the system as a 
population and not as a single cell. In addition, we can 
simulate different contexts, presented in the initial arti- 
cle as different models, within one single model that 
uses different simulation parameters to account for these 
contexts. 

Note that the existence of transient cycles, as shown in 
the toy model, can be deduced from the trajectory of the 
entropy that is significantly higher than the trajectory of 
the transition entropy (which is non-zero, therefore the 
transient cycles are not stable) (Figure 9, lower panel). 

Mammalian cell cycle 

For the last example, we propose a model of the mam- 
malian cell cycle initially published as on ODE model by 
Novak and Tyson [29] and translated into a Boolean model 
by Faure and colleagues [6]. The latter model encompasses 
10 nodes, which describe the mechanisms controlling the 
activity of the different CDK/cyclin complexes, the main 
actors of cell cycle regulation and the dynamics of entry 
into the cell cycle in presence of growth factors. 



We implement the logical rules of the published model 
in MaBoSS and define two parameter values for the 
transition rates: a slow one (set to 1) and a fast one 
(set to 10). The choice between slow and fast rates for 
each transition is based on the choice made in the pub- 
lished Boolean model: different priority classes were used 
in mixed discrete a/synchronous simulation and corre- 
sponded to the differences in speed of cellular processes 
such as transcription, degradation and protein modifica- 
tion. We could, of course, refine the analysis by setting 
different rates for each transition. The network, the logi- 
cal rules and the simulation parameters can be found on 
the webpage. 

As mentioned before, MaBoSS can provide two types of 
outputs: the probabilities of different network states over 
time (along with the entropy and transition entropy) and 
the indecomposable stationary distributions. 

We consider two biological cases, in the presence of 
growth factors where the cell enters its division cycle and 
in the absence of growth factors where the cell is stuck 
in a Gl-like state (state preceding replication of DNA). 
In the model, the activity of CyclinD (CycD), a Gl-cyclin, 
illustrates the presence of growth factors. In our simula- 
tions, we set an initial condition corresponding to a Gl 
state with two CDK/cyclin inhibitors, p27 and cdhl, on, 
and with CyclinD on in order to account for the external 
growth signal. We plot the trajectories of the probabili- 
ties of all the cyclins A, B and E (Figure 10, upper panel). 
The cyclins' activities exhibit an oscillatory behavior. Each 
oscillation can be interpreted as a cell division cycle. How- 
ever, these oscillations are damped. This can be explained 
by the fact that these probabilities should be interpreted 
at the cell population level and after few cycles, the cells 
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Figure 1 0 MaBoSS outputs of the model of the mammalian cell 
cycle: trajectories of probabilities. BKMC algorithm is applied to the 
mammalian cell cycle model, with an initial condition corresponding 
to a G1 state in the presence of growth factors (CyclinD is on). 
Trajectories of the cyclins probabilities, the entropy (H), transition 
entropy (TH) are plotted. The asymptotic behavior corresponds to the 
first indecomposable stationary distribution identified in Figure 10. 
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become desynchronized. Moreover the trajectories of the 
entropy and the transition entropy exhibit the signature of 
cyclic attractors (Figure 10, lower panel). 

The indecomposable stationary distributions are iden- 
tified by the clustering algorithm of MaBoSS and illus- 
trated in Figure 11. The two clusters in Figure 11a show 
the two types of solutions for random initial conditions: 
one multi-cyclic solution when CyclinD is on, and which 
corresponds to the distribution of network states of the 
asymptotic solution of Figure lib, and one fixed point cor- 
responding to a Gl arrest when CyclinD is off (Figure 11c). 

These two indecomposable stationary distributions cor- 
respond to the two attractors identified by discrete time 
modeling in Faure et al. In the discrete time algorithm, 
the asymptotic behavior is described in terms of attractors 
(sub-parts of the transition graph); in our algorithm, the 
asymptotic behavior is described in terms of network state 
probability distributions. 

Conclusions 

We have presented a new algorithm, Boolean Kinetic 
Monte-Carlo or BKMC, applicable to dynamical simula- 
tion of signaling networks based on continuous time in the 
Boolean framework. BKMC algorithm is a natural gener- 
alization of the asynchronous Boolean dynamics [2], with 



time trajectories that can be interpreted in terms of bio- 
logical time. The variables of the Boolean model represent 
biological species and the parameters represent rates of 
activation or inactivation of these species that, ideally, 
could be deduced from experimental data. 

We applied this algorithm to three different models: a 
toy model that illustrates a simple cyclic behavior, a pub- 
lished model of p53 response to DNA damage, and a 
published model of mammalian cell cycle dynamics. 

This algorithm is provided within a freely available 
software, MaBoSS, that can run BKMC algorithm on 
networks up to 64 nodes in the present version. The 
construction of a model uses a specific language that 
introduces logical rules and transition rates of node acti- 
vation/inactivation in a flexible manner. The software 
provides global and semi-global outputs of the model 
dynamics that can be interpreted as signatures of the 
dynamical behaviors. These interpretations become par- 
ticularly useful when the network state space is too large to 
be handled. The convergence of BKMC algorithm can be 
controlled by tuning some simulation parameters: max- 
imum time of the simulation, number of trajectories, 
length of a time window on which the average of probabil- 
ities is performed, and the threshold for the definition of 
stationary distribution clusters. 




lProb[Cluster#1] 
l Prob[Cluster #2] 




lProb[CycD-cdh1-CycA I Cluster #1] 
iProb[CycD-Cdc20--UboH10--cdh1 I Cluster #1] 

Prob[CycD-E2F--CycE-cdh1 I Cluster #1] 
lProb[CycD-cdh1 I Cluster #1] 

sProb[CycD--Cdc20--UbcH10--cdh1--CycB I Cluster #1] 

Prob[CycD-UbcH10-cdh1 I Cluster #1] 

Prob[CycD-UbcH10-CycA I Cluster #1] 

Prob[CycD-UbcH10-CycA-CycB I Cluster #1] 

Prob[CycD-E2F--cdh1 I Cluster #1] 
■ Prob[CycD-CycA I Cluster #1 ] ^ 




Figure 1 1 MaBoSS outputs of the model of the mammalian cell cycle: stationary distributions. BKMC algorithm is applied to the mammalian 
cell cycle model, with random initial conditions. Results of the clustering algorithm that associates a cluster to each indecomposable stationary 
distribution, (a) Probability of reaching each identified cluster; these probabilities are estimated by the proportion of trajectories that belong to each 
cluster, (b) First estimated cluster that can be interpreted as a desynchronized population of cells that are dividing, (c) Second estimated cluster, 
corresponding to a fixed point, that can be interpreted as a G1 cell cycle arrest with no growth factors. 
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The next step is to apply BKMC algorithm with MaBoSS 
on other existing large signaling networks, e.g. EGFR path- 
way [30], the apoptosis pathway [31], etc. The translation 
of existing Boolean models in MaBoSS is straightfor- 
ward but requires the addition of transition rates. In 
these future works, we expect to illustrate the advantage 
of BKMC on other simulation algorithms. Moreover, in 
future developments of MaBoSS, we plan to introduce 
methods for sensitivity analyses, refine approximation 
methods used in BKMC, and generalize Markov property. 

We also expect to implement MaBoSS in broadly used 
software environments for Boolean modeling, like GIN- 
sim [8] or CellNetAnalyzer [25]. 

Methods 

BKMC generates stochastic trajectories. In this section, 
we describe how we use and interpret these trajectories. 

Network state probabilities on a time window 

To relate continuous time probabilities to real processes, 
an observable time window At is defined. A discrete time 
(t € N) stochastic process s(r) (that is not necessary 
Markovian) can be extracted from the continuous time 
Markov process: 

i /.(r+l)At 

P [s(r) = S] = — / dtV [5(f) = S] (8) 

BKMC is used for estimating P [sir) = S] as follows: 

1. Estimate for one trajectory. For each trajectory ;', 
compute the time for which the system is in state S, 
in the window [ r At, (r + 1) At]. Divide this time by 
At. Obtain an estimate of P [s(r) = S] for trajectory 

i.e. P, [s(x) = S]. 

2. Estimate for a set of trajectories. Compute the 
average over j of all Pi [sir) = S] to obtain 

P [sir) = S]. Compute the error of this average 

(.yVar(P [sir) = S])/# trajectories). 
Entropies 

Once P [sir) = S] is computed, the entropy Hir) can be 
estimated: 

Hir) = -£ log 2 (P [sir) = S]) P [sir) = S] (9) 
s 

The entropy measures the disorder of the system. Maxi- 
mum entropy means that all states have the same proba- 
bility; a zero entropy means that one of the states has a 
probability of one. The estimation of the entropy can be 
seen as a global characterization of a full probability distri- 
bution by a single real number. The choice of log 2 allows 
the interpretation of Hir ) in an easier manner: 2' H(t) is an 
estimate of the number of states that have a non-negligible 
probability in the time window [ r At, (r + 1) At]. A more 



computer-like interpretation of//(r) is the number of bits 
that are necessary for describing states of non-negligible 
probability. 

The Transition Entropy TH is a finer measure that char- 
acterizes the system at the level of a single trajectory. It 
can be computed in the following way: for each state S, 
there exists a set of possible transitions S —*■ S'. For each 
of these transitions, a probability is associated: 

Ps^s' = ■ (10) 

Z^S" PS^S" 

By convention, Ps^s' = 0 if there is no transition from 
S to any other state. 

Therefore, the transition entropy TH can be associated 
to each state S: 

THiS) = -J2 log 2 (P s ^s')Ps^s' (ID 

S' 

Similarly, THiS) = 0 if there is no transition from S to 
any other state. The transition entropy on a time window 
THir) is defined as: 

THir) = P ls(r) = S] THiS) 
s 

This transition entropy is estimated in the following 
way: 

1. Estimate for one trajectory. For each trajectory ;', 
compute the set <E> of visited states S in the time 
window [ r At, (r + l)Af] and their respective 
duration fi$. The estimated transition entropy is: 

THir) j = Y J THiS) IJ ^ (12) 
Se<t> 

2. Estimate for a set of trajectories. Compute the 
average over; of all THir)j to obtain THir). 
Compute the error of this average 

(,JvariTHir))/# trajectories). 

This transition entropy is a way to measure how deter- 
ministic the dynamics is. If the transition entropy is always 
zero, the system can only make a transition to a given 
state. 

If probability distributions on a time window tend to 
constant values (or tend to a stationary distribution), the 
entropy and the transition entropy can help characterize 
this stationary distribution such that: 

• A fixed point has zero entropy and zero transition 
entropy, 

• A cyclic stationary distribution has non-zero entropy 
and zero transition entropy. 

Entropy and transition entropy can be considered as 
"global characterizations" of the model: for a given time 
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window, they always consist of two real numbers, what- 
ever the size of the network is. 

Hamming distance distribution 

The Hamming Distance between two states S and S' is the 
number of nodes that have different node states between 
S and S': 

H£>(S,S') = £>-a s , s; ) (13) 
i 

where <5 is the Kronecker delta (<5 s . jS j = 1 if S/ = S S . :S '. = 
0 if Si ^ S'X Given a reference state S re f, the Hamming 
distance distribution (over time) is given by: 

P(HD, t) = J2 p = S] WD(S,S ref ) (14) 
s 

The estimation of the Hamming distance distribution 
on a time window P(HD, x) is similar to that of stochastic 
probabilities on a time window. 

The Hamming distance distribution is a useful charac- 
terization when the set of instantaneous probabilities is 
compared to a reference state (S re f). In that case, the Ham- 
ming distance distribution describes how far this set is to 
this reference state. The Hamming distance distribution 
can be considered as a "semi-global" characterization of 
time evolution: for a given time window, the size of this 
characterization is the number of nodes (to be compared 
with probabilities on a time window whose size is 2 #nodes ). 

Input, internal, output and reference nodes 

Input Nodes are defined as the nodes for which the ini- 
tial condition is fixed. Therefore, each trajectory of BKMC 
starts with fixed values of input nodes and random values 
of other nodes. 

Internal nodes are nodes that are not considered for 
computing probability distributions, entropies and tran- 
sition entropies. Output nodes are nodes that are not 
internal. Technically, probabilities are summed up over 
network states that differ only by the state of internal 
nodes. These internal nodes are only used for gener- 
ating time trajectories with BKMC algorithm. Usually, 
nodes are chosen to be internal when the corresponding 
species is not measured experimentally. Mathematically, it 
is equivalent to transform the original Markov process to a 
new stochastic process (that is not necessary Markovian) 
defined on a new network state space. This new state space 
is defined by the states of the output nodes. This raises 
the question of the transition entropy TH: formally, this 
notion has only a sense within Markovian processes, i.e. 
when there are no internal nodes. Here, we generalize the 
notion of transition entropy even in the case of internal 
nodes. Suppose that the system is in state S: 



• If the only possible transitions from state S to any 
other state consist of flipping an internal node, the 
transition entropy is zero. 

• If there is, at least, one transition from state S to 
another state that flips an output node, then only the 
output nodes will be considered for computing 
probabilities in equation 10. In particular, J^s' PS^S' 
is computed only on output node flipping events. 

Reference nodes are nodes for which a reference node 
state is specified and for which the Hamming distance is 
computed. In this framework, a reference state is com- 
posed of reference nodes for which the node state is 
known and non-reference nodes for which the node state 
is unknown. Note that non-reference nodes may differ 
from internal nodes. 

Stationary distribution characterization 

It can be shown (see Additional file 1, "Basic informa- 
tion on Markov process" corollary 2, section 1.2) that 
instantaneous probabilities of a continuous time Markov 
process converge to a stationary distribution. Fixed points 
and cycles are two special cases of stationary distribu- 
tions. They can be identified by the asymptotic behavior 
of entropy and transition entropy (this works only if no 
nodes are internal): 

• If both the transition entropy and the entropy 
converge to zero, then the process converges to a 
fixed point. 

• if the transition entropy converges to zero and the 
entropy does not, then the process converges to a 
cycle. 

More generally, the complete description of the Markov 
process asymptotic behavior can be expressed as a linear 
combination of the indecomposable stationary distribu- 
tions. 

A set of finite trajectories, produced by BKMC, can 
be used to estimate the set of indecomposable station- 
ary distributions. Consider a trajectory S(t), t e[ 0, T\ , i = 
1, • • • ,n. Let Is(t) = 5 s g (() . The estimation of the asso- 
ciated indecomposable stationary probability distribution 
(so) is done by averaging over the whole trajectory: 

P[so = S] = ]= f dthit) (15) 
1 Jo 

Therefore, a set of indecomposable stationary distribu- 
tion estimates can be obtained by a set of trajectories. 
These indecomposable stationary distribution estimates 
should be clustered in groups, where each group con- 
sists of estimates for the same indecomposable stationary 
distribution. For that, we use the fact that two indecom- 
posable stationary distributions are identical if they have 
the same support, i.e. the same set of network states with 
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non-zero probabilities (shown in Additional file 1, "Basic 
information on Markov process" theorem 2, section 1.2). 
Therefore, it is possible to quantify how similar two inde- 
composable stationary distribution estimates are. A sim- 
ilarity coefficient D(Sq\s®) e[0, 1], given two stationary 
distribution estimates and s? , is defined: 



E p[4° = s] 

Vsesupport(4' ) ,s^ ) ) / 
/ 

yS'esupportCsg'.SQ*) 



(16) 



where 



support^ 0 , s|f) 



S such that P 



[4° = s] 



xi>\sf = sl > o} 



(17) 



Clusters can be constructed when a similarity thresh- 
old a is provided. A cluster of stationary distributions is 
defined as follows: 



C = {5 0 | 3s 0 g C s. t.D(s 0 ,s' 0 ) > a] 



(18) 



For each cluster C, a distribution estimate sq, associ- 
ated to an indecomposable stationary distribution, can be 
defined: 



P[s C =S] = ^£p[s = S] 



(19) 



seC 



Errors on this estimate can be computed by: 
Err (P [s c = S]) = VVar(P [s = S] ,s e C)/|C| (20) 

Notice that this clustering procedure has no sense if 
the process is not Markovian; therefore, no nodes are 
considered as internal. 

Additional files 



Additional file 1 : Supplementary material. Basic information on Markov 
process, abbreviations, definitions and algorithms. 

Additional file 2: Model of the mammalian cell cycle with GINsim, 
BoolNet and MaBoSS. The cell cycle presented in the "Examples" section 
has been modeled using three tools: GINsim, BoolNet, and MaBoSS. The 
results for each tool are presented: (1) GINsim provides steady state 
solutions and transition graphs for two different initial conditions: when 
CycD=0 and CycD=1 . For the synchronous strategy, the transition graph 
can be visualized whereas for the asynchronous strategy, it is not easy to 
read or use; BoolNet constructs two graphical representations of the 
trajectories based on synchronous update strategy, for the case of CycD=0 
(steady state) and CycD=l (cycle); (3) MaBoSS estimates indecomposable 
stationary distributions for the case of CycD=0 (one fixed point, not shown) 
and CycD=l (distribution of probabilities of different network states), and 
time-dependent activities of the cyclins showing damped oscillations. All 
results are coherent but are presented differently with a different focus for 
each tool. 



Abbreviations 
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